NSF PAR Search | NSF Public Access Repository

Extending Quantum Perceptrons: Rydberg Devices, Multi-Class Classification, and Error Tolerance

Agarwal, Ishita; Patti, Taylor L; Araiza Bravo, Rodrigo; Yelin, Susanne F; Anandkumar, Anima (November 2024, arXiv.org)

Quantum Neuromorphic Computing (QNC) merges quantum computation with neural computation to create scalable, noise-resilient algorithms for quantum machine learning (QML). At the core of QNC is the quantum perceptron (QP), which leverages the analog dynamics of interacting qubits to enable universal quantum computation. Canonically, a QP features input qubits and one output qubit, and is used to determine whether an input state belongs to a specific class. Rydberg atoms, with their extended coherence times and scalable spatial configurations, provide an ideal platform for implementing QPs. In this work, we explore the implementation of QPs on Rydberg atom arrays, assessing their performance in tasks such as phase classification between Z2, Z3, Z4 and disordered phases, achieving high accuracy, including in the presence of noise. We also perform multi-class entanglement classification by extending the QP model to include multiple output qubits, achieving 95% accuracy in distinguishing noisy, high-fidelity states based on separability. Additionally, we discuss the experimental realization of QPs on Rydberg platforms using both single-species and dual-species arrays, and examine the error bounds associated with approximating continuous functions.
Full Text Available
A text-guided protein design framework

https://doi.org/10.1038/s42256-025-01011-z

Liu, Shengchao; Li, Yanjing; Li, Zhuoxinran; Gitter, Anthony; Zhu, Yutao; Lu, Jiarui; Xu, Zhao; Nie, Weili; Ramanathan, Arvind; Xiao, Chaowei; et al (March 2025, Nature Machine Intelligence)

Current AI-assisted protein design utilizes mainly protein sequential and structural information. Meanwhile, there exists tremendous knowledge curated by humans in text format describing proteins’ high-level functionalities, yet whether the incorporation of such text data can help in protein design tasks has not been explored. To bridge this gap, we propose ProteinDT, a multimodal framework that leverages textual descriptions for protein design. ProteinDT consists of three consecutive steps: ProteinCLAP, which aligns the representation of two modalities, a facilitator that generates the protein representation from the text modality and a decoder that creates the protein sequences from the representation. To train ProteinDT, we construct a large dataset, SwissProtCLAP, with 441,000 text and protein pairs. We quantitatively verify the effectiveness of ProteinDT on three challenging tasks: (1) over 90% accuracy for text-guided protein generation; (2) best hit ratio on 12 zero-shot text-guided protein editing tasks; (3) superior performance on four out of six protein property prediction benchmarks.
Free, publicly-accessible full text available March 27, 2026
ARDuP: Active Region Video Diffusion for Universal Policies

https://doi.org/10.1109/IROS58592.2024.10802264

Huang, Shuaiyi; Levy, Mara; Jiang, Zhenyu; Anandkumar, Anima; Zhu, Yuke; Fan, Linxi; Huang, De-An; Shrivastava, Abhinav (October 2024, IEEE)

Full Text Available
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Zhao, Jiawei; Zhang, Zhenyu; Chen, Beidi; Wang, Zhangyang; Anandkumar, Anima; Tian, Yuandong (July 2024, International Conference on Machine Learning (ICML))

Training Large Language Models (LLMs) presents significant memory challenges, predominantly due to the growing size of weights and optimizer states. Common memory-reduction approaches, such as low-rank adaptation (LoRA), add a trainable low-rank matrix to the frozen pre-trained weight in each layer, reducing trainable parameters and optimizer states. However, such approaches typically underperform training with full-rank weights in both pre-training and fine-tuning stages since they limit the parameter search to a low-rank subspace and alter the training dynamics, and further, may require full-rank warm start. In this work, we propose Gradient Low-Rank Projection (GaLore), a training strategy that allows full-parameter learning but is more memory-efficient than common low-rank adaptation methods such as LoRA. Our approach reduces memory usage by up to 65.5% in optimizer states while maintaining both efficiency and performance for pre-training on LLaMA 1B and 7B architectures with C4 dataset with up to 19.7B tokens, and on fine-tuning RoBERTa on GLUE tasks. Our 8-bit GaLore further reduces optimizer memory by up to 82.5% and total training memory by 63.3%, compared to a BF16 baseline. Notably, we demonstrate, for the first time, the feasibility of pre-training a 7B model on consumer GPUs with 24GB memory (e.g., NVIDIA RTX 4090) without model parallel, checkpointing, or offloading strategies.
Full Text Available
Sum-of-Squares inspired Quantum Metaheuristic for Polynomial Optimization with the Hadamard Test and Approximate Amplitude Constraints

Wang, Iria W; Brown, Robin; Patti, Taylor L; Anandkumar, Anima; Pavone, Marco; Yelin, Susanne F (August 2024, arXiv.org)

Quantum computation shows promise for addressing numerous classically intractable problems, such as optimization tasks. Many optimization problems are NP-hard, meaning that they scale exponentially with problem size and thus cannot be addressed at scale by traditional computing paradigms. The recently proposed quantum algorithm arXiv:2206.14999 addresses this challenge for some NP-hard problems, and is based on classical semidefinite programming (SDP). In this manuscript, we generalize the SDP-inspired quantum algorithm to sum-of-squares programming, which targets a broader problem set. Our proposed algorithm addresses degree- polynomial optimization problems with variables (which are representative of many NP-hard problems) using qubits, quantum measurements, and classical calculations. We apply the proposed algorithm to the prototypical Max-SAT problem and compare its performance against classical sum-of-squares, state-of-the-art heuristic solvers, and random guessing. Simulations show that the performance of our algorithm surpasses that of classical sum-of-squares after rounding. Our results further demonstrate that our algorithm is suitable for large problems and approximates the best known classical heuristics, while also providing a more generalizable approach compared to problem-specific heuristics.
Full Text Available
PerAda: Parameter-Efficient Federated Learning Personalization with Generalization Guarantees

Xie, Chulin; Huang, De-An; Chu, Wenda; Xu, Daguang; Xiao, Chaowei; Li, Bo; Anandkumar, Anima (June 2024, Computer Vision and Pattern Recognition Conference (CVPR 2024))

Personalized Federated Learning (pFL) has emerged as a promising solution to tackle data heterogeneity across clients in FL. However, existing pFL methods either (1) introduce high computation and communication costs or (2) overfit to local data, which can be limited in scope and vulnerable to evolved test samples with natural distribution shifts. In this paper, we propose PERADA, a parameter-efficient pFL framework that reduces communication and computational costs and exhibits superior generalization performance, especially under test-time distribution shifts. PERADA reduces the costs by leveraging the power of pretrained models and only updates and communicates a small number of additional parameters from adapters. PERADA achieves high generalization by regularizing each client’s personalized adapter with a global adapter, while the global adapter uses knowledge distillation to aggregate generalized information from all clients. Theoretically, we provide generalization bounds of PERADA, and we prove its convergence to stationary points under non-convex settings. Empirically, PERADA demonstrates higher personalized performance (+4.85% on CheXpert) and enables better out-of-distribution generalization (+5.23% on CIFAR-10-C) on different datasets across natural and medical domains compared with baselines, while only updating 12.6% of parameters per model. Our code is available at https://github.com/NVlabs/PerAda.
Full Text Available
Equivariant graph neural operator for modeling 3D dynamics

Xu, Minkai; Han, Jiaqi; Lou, Aaron; Kossaifi, Jean; Ramanathan, Arvind; Azizzadenesheli, Kamyar; Leskovec, Jure; Ermon, Stefano; Anandkumar, Anima (May 2024, International Conference on Machine Learning)

Full Text Available
Near-term distributed quantum computation using mean-field corrections and auxiliary qubits

https://doi.org/10.1088/2058-9565/ad3f45

McClain Gomez, Abigail; Patti, Taylor L.; Anandkumar, Anima; Yelin, Susanne F. (May 2024, Quantum Science and Technology)

Distributed quantum computation is often proposed to increase the scalability of quantum hardware, as it reduces cooperative noise and requisite connectivity by sharing quantum information between distant quantum devices. However, such exchange of quantum information itself poses unique engineering challenges, requiring high gate fidelity and costly non-local operations. To mitigate this, we propose near-term distributed quantum computing, focusing on approximate approaches that involve limited information transfer and conservative entanglement production. We first devise an approximate distributed computing scheme for the time evolution of quantum systems split across any combination of classical and quantum devices. Our procedure harnesses mean-field corrections and auxiliary qubits to link two or more devices classically, optimally encoding the auxiliary qubits to both minimize short-time evolution error and extend the approximate scheme’s performance to longer evolution times. We then expand the scheme to include limited quantum information transfer through selective qubit shuffling or teleportation, broadening our method’s applicability and boosting its performance. Finally, we build upon these concepts to produce an approximate circuit-cutting technique for the fragmented pre-training of variational quantum algorithms. To characterize our technique, we introduce a non-linear perturbation theory that discerns the critical role of our mean-field corrections in optimization and may be suitable for analyzing other non-linear quantum techniques. This fragmented pre-training is remarkably successful, reducing algorithmic error by orders of magnitude while requiring fewer iterations.
Stability Constrained Reinforcement Learning for Decentralized Real-Time Voltage Control

https://doi.org/10.1109/TCNS.2023.3338240

Feng, Jie; Shi, Yuanyuan; Qu, Guannan; Low, Steven H.; Anandkumar, Anima; Wierman, Adam (January 2024, IEEE Transactions on Control of Network Systems)

Full Text Available
